Abstract



This paper formed the basis for our oral and written presentations to the House of Commons Select 
Committee on Education and Skills on the topic of underachievement. ‘Underachievement’ is now a 
widely used term in education policy and practice. It is used routinely to refer to nations, home nations 
and regions, to types and sectors of schooling, to physiological, ethnic and social groups, and to 
individuals. It has been used to mean simply low achievement, also lower achievement relative to 
another of these groups, and also lower achievement than would be expected by an observer. The 
paper presents examples of each. These multiple uses lead to considerable confusion which, coupled 
with common errors in assessing the proportionate difference between groups, mean that significant 
public money has been spent attempting to overcome problems that may not, indeed, exist. Where 
underachievement is understood to mean a lower level of achievement by an individual (or group) than 
would be expected using a model based on the best available predictors, then there is nothing we can 
know about underachieving individuals (or groups) that they have in common. They cannot be
disproportionately working-class males, for example, because class and sex would then be part of the 
‘best available predictors’. Even if, instead, we reserve some predictors from our best model, there is 
no evidence that underachievers have much in common (and examples from such models are presented
in the paper). In raw-score terms, we might say that a particular social group exhibits lower achievement 
(in the sense of publicly available figures relating to pencil and paper tests) than another, as in the case of 
some ethnic groups. Or we might say that there is differential attainment between groups, as in the case 
of males and females. This is very far from saying that the lower-attaining group could and should do
better on that assessment, or that the surface dissimilarity (such as ethnicity or sex) is in any way the 
cause of the difference in attainment. Making explicit what we mean by underachievement is an
important step towards accepting that, collectively, we do not really mean anything by it. 
Summary



• ‘Achievement’ at school generally describes levels of attainment in public examinations such as 
GCSE. 

• The validity of public examinations as measures of achievement is not perfect. The generalisability of 
pencil and paper tests to real-life tasks can be rather low. 

• Public examinations are not wholly reliable. Therefore, small differences between levels of 
attainment cannot be attributed solely to real differences in achievement.

• Fair and rigorous comparisons cannot be made between different forms of attainment.
Comparability is reduced by differences over time, place, exam board, mode of examining, subject 
and syllabus. 

• Gaps in attainment cannot be calculated by simple subtraction. They must be proportionate, 
contextualised, and hedged around with doubts about the underlying distribution of the scores. 

• ‘Underachievement’ is used to describe a range of phenomena. These range from the differential 
attainment of groups of school students (such as those formed by nation, region, ethnicity, language,
school type, sex and social class) to the failure of an individual student to attain a level equivalent to 
the best prediction of their future performance (value-added or contextualised). 

• Once operationalised, there is no convincing evidence for any of these forms of underachievement.

• There are problems of unreliability and invalidity in the categories frequently used to define groups of 
underachievers (such as social class and ethnicity). As the unreliability of attainment measures and 
classifying variables increases so does the chance of spurious ‘effects’. 

• In the UK, there is an absence of appropriate experiments to assess the reasons why some groups 
do less well in compulsory schooling. Only experimental designs can test causal models leading to 
fruitful ameliorative action. Filling this gap was a primary purpose of the thirty million pounds spent 
on the ESRC-controlled Teaching and Learning Research Programme. 

• Given this lacuna, we are left with post hoc analyses of large datasets seeking cause by statistical 
manipulation, and small-scale studies of ‘qualitative’ data often not seeking causes at all. Both 
approaches have significant defects. This paper focuses on the former approach, but the problems 
generally encountered in the latter approach are even greater in terms of rigour, generalisability and 
comprehensibility. 

• There is no reason to assume that achievement in the UK is worse than in comparable nations. Nor 
is there any evidence for the much-cited notion that results in the UK are more polarised. 

• There is no reason to assume that achievement in different parts of the UK, or in different types of 
schools, is different for equivalent students. 

• There is no reason to assume that achievement differs between social groups, as defined by 
ethnicity, social class, language or sex (for otherwise equivalent students). 

• The differences in raw-score attainment in the above groups disappear in either a value-added or a 
contextualised analysis. 

• There is some evidence that achievement in state-funded schools is improving over time, and that, 
contrary to popular reports, the gaps in attainment between identifiable groups are declining.

• Much public money is being spent on research that cannot produce the answers required of it, and 
on policies to ameliorate growing gaps in attainment that do not exist.
There is insufficient space here to argue each of the above closely with full supporting evidence. Instead
the outline below uses references to published peer-reviewed material available upon request to
supplement the examples of research given.



1. Examinations and comparability 

1.1 There have long been complaints that standards of attainment in UK education have fallen over time
(Cresswell and Gubb 1990, National Commission on Education 1993, Barber 1996), that they are
poor in comparison to similar countries (Boyson 1975, Prais 1990, Skills and Enterprise Network
1999), and that standards are particularly poor for the lowest achievers (Postlethwaite 1985, Bentley
1998, DfES 2001). Therefore, the UK is supposed to have a uniquely polarised assessment system,
with excellent results for some and a long tail of underachievers. Claims such as these are quite 
common, and contribute to what has become a ‘crisis account’ of the state of the UK education system 
and its schools (Gorard 2000a). 

1.2 However, judging standards is difficult without having a close definition of the term 'standard'. As an 
illustration of how elastic the term can be, consider the very real situation in which an educational 
attainment indicator such as a GCSE becomes more common over a period of ten years. One group of 
commentators may claim that standards have therefore improved, because more students now attain the
GCSE standard. Their opponents may claim that standards have fallen, since the GCSE is now 
demonstrably easier to obtain and also worth less in exchange. The point to be made here is that 
knowledge is not a static commodity, and comparisons of changes over time in school attainment have 
to try and take these changes into account. One analogy for the complaint by the National Commission
on Education (1993) that number skills have deteriorated for 11-15 year olds, would be the clear drop
over the last millennium in archery standards among the general population. If the number of children
knowing the meaning of the word 'mannequin' drops from the 1950s to the 1970s, is this evidence of some
kind of decline in schooling? Perhaps it is simply evidence that words and number skills have changed in 
their everyday relevance. On the other hand, if the items in any test are changed to reflect these changes 
in society, then how do we know that the test is of the same level of difficulty as its predecessor? In 
public examinations, by and large, we have until now relied on norm-referencing. That is, two tests are 
declared equivalent in difficulty if the same proportion of matched candidates obtain each graded result 
on both tests. The assumption is made that the actual standards of each annual cohort are equivalent, 
and it is these that are used to benchmark the assessment. How then can we measure changes in 
standards over time (for there cannot be any, by definition)? But, if the test is not norm-referenced how 
can we tell that apparent changes over time are not simply evidence of differentially demanding tests?
This apparently insuperable problem has, to my mind, not been adequately addressed (Gorard 2001a). 

1.3 Britain uses different regional authorities (local examination boards) to examine what are meant to 
be national assessments at 16+ and 18+ (Noah and Eckstein 1992). It is clear that even qualifications 
with the same name (e.g. GCSE History) are not equivalent in terms of subject content as each board
sets its own syllabus. Nor are they equivalent in the form of assessment, or the weighting between 
components such as coursework and multiple-choice. Nor is there any evidence that the different 
subjects added together to form aggregate benchmarks are equivalent in difficulty to each other. In fact, 
comparability can be considered between boards in any subject, the years in a subject/board 
combination, the subjects in one board, and the alternative syllabuses in any board and subject. All of
these are very difficult to determine, especially as exams are neither totally accurate nor reliable in what 
they measure (Nuttall 1979). The system of statutory assessment is also producing a flood of complaints 
about irregularities and inconsistencies (Cassidy 1999). Pencil-and-paper tests can have little 
generalisable validity, and their link to other measures such as occupational competence is generally very 
small (Nuttall 1987). 

1.4 The problems faced by researchers in international studies of student performance are even greater. 
These include the comparability of different assessments, the comparability of the same assessments 
over time, using examinations or tests as indicators of performance at all, the different curricula in 
different countries, the different standards of record-keeping in different countries, and the 
competitiveness (especially) of developing countries (see O'Malley 1998). Yet what international 
comparisons seek to do is solve not one but all of these problems at once (Gorard 2000b). 

1.5 A further problem is that simple differences between attainment scores are being routinely 
misrepresented by academics, policy-makers and the media, in a way that takes no account of their 
underlying distribution or their base rate (Gorard 1999, Gorard and Taylor 2002a). 
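
The difference between a simple and a proportionate comparison can be sketched with a few lines of Python. The pass rates below are hypothetical, and the proportionate measure used here (the difference divided by the sum of the two rates) is an assumption made for illustration rather than a calculation taken from the cited papers.

```python
def point_gap(lower, higher):
    """Simple subtraction: the percentage-point difference between two groups."""
    return higher - lower


def proportionate_gap(lower, higher):
    """The same difference expressed relative to the combined level (in %)."""
    return 100.0 * (higher - lower) / (higher + lower)


# Hypothetical pass rates (%) for two groups in an earlier and a later year.
earlier = (10.0, 20.0)
later = (30.0, 45.0)

for label, (lower, higher) in (("earlier", earlier), ("later", later)):
    print(label, point_gap(lower, higher), round(proportionate_gap(lower, higher), 1))
```

On these invented figures the percentage-point gap grows while the proportionate gap shrinks, so the same data could be reported as either a widening or a narrowing gap, depending on whether the base rate is taken into account.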

1.6 In summary, it is extremely difficult to claim that small differences in ‘surface’ attainment between
students represent real differences in achievement. 



2. Underachievement 

‘Many pupils underachieve during the years of compulsory education, especially in Wales’
(ETAG, 1998, p.27). 

‘Today's underachieving boy is tomorrow's unemployed youth. He is public burden number 
one, needing benefit in the world of global competition where governments want to get taxes 
down’ 

(Mahony, cited in Dean 1998) 

‘To overcome economic and social disadvantage and to make equality of opportunity a reality, 
we must strive to eliminate and never excuse underachievement in the most deprived parts of 
our country’ 

(DfEE, 1997, p.3) 

‘West Indian children as a group are underachieving in our education system and this should be 
a matter of deep concern not only to all those involved in education but also the whole 
community’ 

(DES 1985, p.3). 

2.1 Here are four statements from four different sources being used to describe four groups of 
‘underachievers’ - a nation, a gender, a social group and an ethnic group. Underachievement has been
described as the ‘predominant discourse’ in education in recent times (Weiner et al 1997, p.620). A 
‘crisis account’ of the state of our schools seems to permeate our society. Government policy is being
made to counter not only the social consequences of underachievement - criminal behaviour, social
exclusion, unsuccessful relationships and marriages (Bentley 1998), but also its economic implications 
for the global competitiveness of nations whose education systems are increasingly tied to the economy 
(Istance and Rees 1994, Docking 2000). Hardly a week passes without an article in the Times 
Educational Supplement describing the attempts of schools up and down the country to eliminate the 
‘underachievement’ of a certain group of pupils. The list of related initiatives is considerable: homework 
clubs, school trips, ICT programmes to get fathers more involved with their sons’ education, mentoring 
schemes and so on (Lawrence et al. 1997, Learner 2001, Wallace 2000). 

2.2 The term ‘underachievement’ has been used by politicians, journalists, academics and practitioners 
to describe relatively poor academic performance, from a nation to an individual, but a review of the
literature suggests that a consensus on its definition and measurement is hard to come by (Gorard
2000c). One of the problems with the notion of underachievement is, quite simply, in understanding
what the underachievement is in relation to. Is it related to some kind of innate ability on the part of the 
individual or is it achievement relative to that of a larger group? In this latter case a more apposite term 
might be ‘low achievement’ or, more generally, differential achievement. ‘Underachievement’ is used 
routinely to refer to nations, home nations and regions, to types and sectors of schooling, to 
physiological, ethnic and social groups, and to individuals. It has been used to mean simply low 
achievement, also lower achievement relative to another of these groups, and lower achievement than 
would be expected by an observer. These multiple uses lead to considerable confusion which, coupled 
with common errors in assessing the proportionate difference between groups, mean that significant 
public money has been spent attempting to overcome problems that may not, indeed, exist (Gorard et
al. 2001).

2.3 Much previous work on defining and measuring underachievement has relied on what could be 
termed the ‘psychologist’s’ definition of underachievement. That is ‘school performance, usually 
measured by grades that is substantially below what would be predicted on the basis of the student’s 
mental ability, typically measured by intelligence or standardised academic tests’ (McCall et al. 1992,
p.54). However, the problem with adopting this method is that it does not take into account other 
factors that are widely acknowledged to contribute to academic achievement, such as social class and 
pupil attitudes towards school. Neither does this method fully compensate for errors in the design and
measurement of commonly used standardised ability tests and school examinations. Alternatively, we 
could broaden the definition of the term underachievement as ‘achievement falling below what would be 
forecast from our most informed and accurate prediction, based on a team of predictor variables’ 
(Thorndike 1963, p.19).

2.4 Where underachievement is understood to mean a lower level of achievement by an individual (or 
group) than would be expected using a model based on the best available predictors, then the 
underachieving individuals must have nothing in common (else that common factor would become part 
of the best prediction). If, instead, we reserve some predictors from our best model (sex or poverty, for 
example), we still find no evidence that underachievers have much in common (Smith 2002). In
raw-score terms, we might say that a particular social group exhibits lower achievement (in the sense of
publicly available figures relating to pencil and paper tests) than another, as in the case of some ethnic 
groups. Or we might say that there is differential attainment between groups, as in the case of males and 
females. This is very far from saying that the lower-attaining group could and should do better on that
assessment. The term underachievement has conceptual and practical difficulties, which chiefly lie in
determining what the ‘under’ is in relation to. When it is used in relation to peers, or prior attainment, or
cognitive aptitude tests for example there is no clear way of separating it from errors in the baseline
testing system. To assume, as the DfES and many researchers in this field appear to, that the assessment 
system is neutral (by sex, for example) and that any differential is related to achievement or performance 
seems peculiarly naive. This is especially so in the light of the lack of complete reliability in statutory 
assessments. Making explicit what we mean by underachievement is an important step towards 
accepting that, collectively, we may not really mean anything by it. 
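
The logic of the 'best available predictors' definition can be sketched in code. The example below is a minimal illustration on simulated data, not the model used in the cited work: the variable names, the coefficients and the binary group indicator are all hypothetical. It fits an ordinary least-squares model and treats the residual as 'underachievement'.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, ys):
    """Least-squares coefficients; each row must start with 1.0 for the intercept."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)


def group_mean(values, rows, group):
    """Mean of values for the cases whose group indicator equals `group`."""
    picked = [v for v, r in zip(values, rows) if r[2] == group]
    return sum(picked) / len(picked)


random.seed(1)
rows, scores = [], []
for _ in range(2000):
    group = random.choice([0.0, 1.0])   # hypothetical binary social category
    prior = random.gauss(50, 10)        # hypothetical prior attainment
    score = 5 + 0.9 * prior - 4 * group + random.gauss(0, 8)
    rows.append([1.0, prior, group])
    scores.append(score)

beta = ols(rows, scores)
residuals = [y - sum(b * x for b, x in zip(beta, r)) for r, y in zip(rows, scores)]

# Gap between the two groups in raw scores, and in residuals from the model.
raw_gap = group_mean(scores, rows, 1.0) - group_mean(scores, rows, 0.0)
residual_gap = group_mean(residuals, rows, 1.0) - group_mean(residuals, rows, 0.0)
print(round(raw_gap, 2), round(residual_gap, 6))
```

The raw-score gap between the two simulated groups is clearly visible, but the mean residual is the same (zero, up to rounding) in both groups. Once the category is among the predictors, neither group can have a higher average residual than the other.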

2.5 The nature of formal assessments means that comparing standards over time (or between groups) is 
very difficult. If the same test is administered repeatedly year-on-year, so that we can assume the same
level of difficulty over time, then there are potential practice effects. Any increase in test results could be 
due to familiarity with the test. On the other hand, where the test is changed every year to keep it
up-to-date and prevent practice effects, then we have no way of knowing whether successive tests are of the
same standard. Until 1987 this problem was largely overcome in public examinations by ‘norm- 
referencing’. An assumption was made that the test cohort every year was of the same ability, but that 
the test varied. So, instead of having a pass mark the test had a set pass proportion. For example, in O- 
level English perhaps 10% of the entry cohort were given the top grade every year. So, by definition, it 
was impossible to ask whether standards were rising year-on-year. The underlying assumption of exam 
marking was that standards did not change. The only change allowable was in the proportion of the age 
cohort entering any examination. Since 1987 the UK has moved to a system based largely on criterion 
referencing. Now, each grade is related to a description of what is required, and if the candidate gives 
evidence of this then the grade is awarded. Since 1987, therefore, standards have been allowed to vary. 
This has led to an annual increase in exam scores, but has also made it impossible to tell whether this is 
due to rising standards of candidates or a lowering of the standards of tests. In the absence of a valid 
independent benchmark, any discussion of relative educational standards in the UK is somewhat 
pointless. 
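
The contrast between the two grading regimes can be sketched as follows. This is an illustrative simulation only: the marks are invented, and the 10% share and the fixed mark of 70 are assumptions chosen to echo the O-level example above.

```python
import random

random.seed(0)
# Hypothetical marks for one cohort, and a second cohort in which every mark
# is 5 points higher. The marks alone cannot say whether the candidates were
# abler or the paper was easier.
cohort_a = [random.gauss(55, 10) for _ in range(1000)]
cohort_b = [m + 5 for m in cohort_a]


def norm_referenced(marks, share=0.10):
    """Give the top grade to a fixed share of the cohort.

    Returns the cut-off mark that results and the share actually awarded.
    """
    ranked = sorted(marks, reverse=True)
    n_top = int(len(marks) * share)
    return ranked[n_top - 1], n_top / len(marks)


def criterion_referenced(marks, required_mark=70.0):
    """Give the top grade to everyone reaching a fixed mark; return the share."""
    return sum(m >= required_mark for m in marks) / len(marks)


for name, cohort in (("A", cohort_a), ("B", cohort_b)):
    cutoff, share = norm_referenced(cohort)
    print(name, round(cutoff, 1), share, criterion_referenced(cohort))
```

Under norm-referencing the share awarded the top grade is identical for both cohorts and only the cut-off mark moves, so no change in standards can ever be recorded. Under criterion referencing the share rises for the second cohort, but nothing in the results distinguishes better candidates from an easier test.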

National achievement 

2.6 Similar problems arise when trying to compare results between countries. Here, the problems of 
different entry rates and different standardisation procedures are compounded by the different 
assessment systems, and even by differences in the educational systems (and, of course, the curricula) 
themselves. Where the same test is administered in each country (as in the Third International 
Mathematics and Science Study), re-consideration of the results shows that there is no convincing 
evidence of ‘underachievement’ in the UK. UK scores are compared with countries like: the US which
has much fuller coverage of the curriculum underlying the test; Singapore where children do not advance 
through school years automatically (meaning that they are, on average, 6 months older than UK students 
in TIMSS); and even Thailand whose scores are based only on the 32% of the age cohort attending 
school. Where a different test is used for each country (perhaps more appropriate to the local 
curriculum), then problems of comparability arise. How can we tell whether the baccalaureate in France 
or the Abitur in Germany are equivalent in difficulty to the GCSE in the UK?
2.7 Anyway, sixteenth place for England in TIMSS (Mathematics) is far from impressive, but better
than several countries including USA, Norway and Spain. Many of the other countries taking part also 
scored lower, but were omitted by the researchers from analysis as they did not meet the sampling 
requirements for the study. In this study of the attainment of 14 year-olds, one South American country 
submitted scores for a cohort averaging 16 years of age. Otherwise, the oldest average age is for 
Singapore at the top of the table in terms of score, and the youngest is for Iceland near the bottom. The 
linear correlation between age and score means that one would expect countries with older children in 
the test to have higher scores, and that nearly 30% of the variance in outcomes is explicable by 
differences in mean age alone (Gorard 2000b). There are further problems with the study in terms of 
sampling, low response rate (below 50% for England, Keys et al. 1996), exclusion of students with 
special educational needs, overlap of standard errors, and motivation. Brown (1998) concludes that the 
information in international league tables is generally too flawed to be of any use at all. 
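
The link between a linear correlation and the share of variance 'explained' is that the latter is the square of the former. The sketch below uses invented country-level figures purely to show the calculation; they are not the TIMSS data.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical mean ages (years) and mean scores for six countries.
mean_age = [13.6, 13.9, 14.0, 14.1, 14.3, 14.5]
mean_score = [487, 522, 498, 540, 506, 551]

r = pearson_r(mean_age, mean_score)
variance_explained = r ** 2

# Working backwards: 30% of variance explained implies a correlation of about 0.55.
implied_r = math.sqrt(0.30)
print(round(r, 2), round(variance_explained, 2), round(implied_r, 2))
```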

School achievement 

2.8 At the level of comparison between schools (department or teachers), school effectiveness work 
has attempted to describe the characteristics of a successful school in a way that could form the basis of 
a blueprint for school improvement. Ironically, the major undisputed outcome of all of this work has 
been the reinforcement of the importance of non-school context (Coleman et al. 1966, Gray and Wilcox 
1995). National systems, school sectors, schools, departments and teachers combined have been found 
to explain approximately zero to 20% of the total variance in school outcomes. In all studies this ‘effect’ 
is small, and the larger the sample used, the weaker is the evidence of any effect at all (Shipman 1997) - 
and, of course, we could not be certain that it is an ‘effect’ since the underlying causal model remains 
opaque. The remainder of the variance in outcomes is explained by student background, prior 
attainment and error components. Work by Tymms (2003) has shown that the size of school effects is 
inversely related to the reliability of the measurements involved. This raises the intriguing possibility that
school (and other) effects are simply a product of the unreliability inherent in the assessment system. 
When researchers have attempted to relate this small school-effect to school characteristics and 
processes, so producing a blueprint for school improvement, the results have generally been negligible. 
The factors making up a 'good' school are frequently nebulous (Ouston 1998) or tautological (Hamilton 
1997). 
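
One way in which unreliable measurement could generate an apparent school effect can be sketched by simulation. The model below is a deliberately simple assumption, not a reconstruction of the cited analysis: schools differ only in their intake, there is no school term in the outcome at all, and prior attainment is measured with a chosen amount of error. All parameter values are hypothetical.

```python
import random


def apparent_school_effect(noise_sd, n_schools=40, n_pupils=50, seed=0):
    """Share of outcome variance lying between schools after adjusting for
    a prior-attainment measure that carries error of size noise_sd."""
    rng = random.Random(seed)
    prior, outcome, school = [], [], []
    for s in range(n_schools):
        intake_mean = rng.gauss(50, 8)          # schools differ only by intake
        for _ in range(n_pupils):
            ability = rng.gauss(intake_mean, 10)
            prior.append(ability + noise_sd * rng.gauss(0, 1))
            outcome.append(ability + rng.gauss(0, 3))   # no school effect
            school.append(s)
    n = len(prior)
    mean_p, mean_o = sum(prior) / n, sum(outcome) / n
    # Simple 'value-added' model: regress the outcome on measured prior attainment.
    slope = (sum((p - mean_p) * (o - mean_o) for p, o in zip(prior, outcome))
             / sum((p - mean_p) ** 2 for p in prior))
    resid = [(o - mean_o) - slope * (p - mean_p) for p, o in zip(prior, outcome)]
    school_means = [
        sum(r for r, s in zip(resid, school) if s == k) / n_pupils
        for k in range(n_schools)
    ]
    between = sum(m * m for m in school_means) / n_schools
    total = sum((o - mean_o) ** 2 for o in outcome) / n
    return between / total


for noise in (0.0, 5.0, 10.0):
    print(noise, round(apparent_school_effect(noise), 4))
```

In this sketch the apparent between-school share grows as the prior-attainment measure becomes less reliable, even though no school has any effect on its pupils. The less reliable the adjustment, the more of the intake differences it leaves behind to be read as a school 'effect'.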

2.9 Where claims have been made regarding the superiority of schools in one or more home countries 
of the UK, the situation is somewhat easier to assess as the systems themselves are more similar. While 
the countries have very similar school systems, Wales, for example, has until recently produced lower 
exam scores at all levels than England. However, once levels of poverty have been taken into account, 
schools in Wales have produced results that are at least as good as those in England (Gorard 1998a).
Similar points can be made about differences between types of schools within one home country 
(Gorard 1998b). To expect a school with many students in poverty to gain the same kind of exam 
success as a school with nearly no poor students at all, is ridiculous. Yet this is what raw-score 
comparisons (such as league tables) do. Once levels of poverty, and other background factors, are 
taken into account in regression equations then there is no evidence that any type of school performs 
any better than any other. State-funded schools in the UK are also rapidly catching up with the exam 
scores of fee-paying schools (Gorard and Taylor 2002b). So the question is not about the 
underachievement of schools or regions. Rather it is why there is this link between poverty and
attainment, and what can be done about it. 

2.10 Once their context is taken into account, there appear to be better and worse performing schools 
of all types and in all sectors. However, the overwhelming majority of variance in school results is 
predicted by the nature (or prior attainment) of the intake. Little variance is left to be labelled a 'school
effect', and even this contains an error component of unknown size. Put another way, there is no clear
evidence of schools having much systematic effect at all on the attainment of their students. It appears
that each individual would achieve pretty much as they do in any school, and that school 'improvement'
consists largely of admitting more high achieving students - whether through direct selection as in some 
specialist and all grammar schools, or indirectly via the admissions systems, as in faith-based and 
Foundation schools. 

2.11 In summary, once the issues discussed in section 1 are taken on board it is difficult to conclude that
levels of attainment in the UK are poor, falling, or weak in comparison to other countries. It is difficult to 
conclude that any one sector, or type of school, is weaker than another. It is not possible to identify 
entire groups of students with a tendency to underachieve. It is possible to identify groups which attain 
lower scores - but the category which binds them together (such as sex or social class) is merely a 
‘pseudo-explanation’ for their lower achievement (see below). There is some evidence that standards of 
attainment are improving over time.



3. Achievement gaps 

3.1 This section examines patterns of attainment polarisation in England at a variety of levels. The PISA 
study in 2000 involved all EU countries. National segregation by examination outcome (for reading - the 
only score with complete coverage) is largely explicable by the use of academic (and other forms of) 
selection (Smith and Gorard 2002a). In all countries there are small gaps between the performance of 
boys and girls in reading - in favour of girls. This gap is generally smaller in countries with the highest 
overall scores. Overall, the Scandinavian countries of Sweden, Finland and Denmark show less 
segregation on all indicators. The UK has below average segregation in terms of all indicators, despite a
commonly held but unfounded view that segregation in the UK is among the worst in the world.

3.2 Table 1 presents the results for reading performance according to the students’ score on the PISA 
indicator of wealth (Smith and Gorard 2002b). Students who fall into the lowest 10% by wealth 
generally perform less well on the reading tests. In general, countries with the lowest gap in reading 
performance between richest and poorest are also those that have relatively high scores, even for the 
poorest 10%. Finland, Ireland and the Netherlands have high scores for both groups, while France, 
Germany and Luxembourg with heavily selective systems have both very low scores for the poorest 
10% and only average scores for the richest 90%. The UK has the fourth highest score for the poorest 
10% and the third highest score for the richest 90%. In fact, the scores in the UK are so far from
polarised that the reading score for the lowest 10% is higher than the overall score for most countries. 
There is no evidence here of the purported crisis of underachievement in UK education. However, all of 
the foregoing caveats also apply to these figures. 



Table 1 - Mean reading score according to PISA indicator of family wealth 



Country        Poorest 10%    Richest 90%

Luxembourg         385            452
Portugal           422            483
Germany            454            504
Greece             456            475
France             465            509
Spain              469            499
Italy              472            492
Austria            477            502
Denmark            479            502
Belgium            489            519
Sweden             495            519
UK                 502            529
Ireland            512            530
Finland            540            550
Netherlands        541            543
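
The figures in Table 1 can be summarised with a short script. The scores are those given in the table. The proportionate gap (difference divided by sum) is an assumed measure used for illustration, and the variable names are hypothetical.

```python
# Mean reading scores from Table 1: (poorest 10%, richest 90%).
table_1 = {
    "Luxembourg": (385, 452), "Portugal": (422, 483), "Germany": (454, 504),
    "Greece": (456, 475), "France": (465, 509), "Spain": (469, 499),
    "Italy": (472, 492), "Austria": (477, 502), "Denmark": (479, 502),
    "Belgium": (489, 519), "Sweden": (495, 519), "UK": (502, 529),
    "Ireland": (512, 530), "Finland": (540, 550), "Netherlands": (541, 543),
}

# Simple gap in score points, and the gap relative to the combined level (%).
gaps = {c: rich - poor for c, (poor, rich) in table_1.items()}
relative = {c: 100.0 * (rich - poor) / (rich + poor)
            for c, (poor, rich) in table_1.items()}

smallest = min(gaps, key=gaps.get)
largest = max(gaps, key=gaps.get)

# Rank of each country by the score of its poorest 10% (1 = highest).
by_poorest = sorted(table_1, key=lambda c: table_1[c][0], reverse=True)
uk_rank_poorest = by_poorest.index("UK") + 1

print(smallest, gaps[smallest], largest, gaps[largest], uk_rank_poorest)
for country in by_poorest:
    print(country, gaps[country], round(relative[country], 1))
```

The smallest gaps belong to the countries whose poorest students score highest, and the largest gap to the country whose poorest students score lowest, which is the pattern described in paragraph 3.2.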



Gaps between groups 

3.3 Policy-makers, media commentators, and academics have recently worked together to create a
‘moral panic’ about the underachievement of boys at school (see for example, DENI 1997, Dean
1998). Although each account may have minor variations, the dominant version is as follows. There was
a fairly recent period when boys were out-performing, or at least out-scoring, girls at school. Then girls 
began to catch up in terms of school performance and qualifications. They have now overtaken the 
boys, and the gap between the genders is increasing over time. Boys are prevalent in terms of school 
failure, non-qualification, exclusion and special needs. This is a universal phenomenon unrelated to local 
socio-economic considerations. Boys are therefore underachieving (see Salisbury et al. 1999 for a fuller 
account of this literature). Since this much is apparently clear the next task is to overcome the
disadvantage of boys by remedial action in schools. This task is being attempted by multiple action
research projects (e.g. School of Education 1998) or by attempting to transfer strategies from schools
presumed to show good practice because they have a lower gender gap in attainment than their peers
(as in the DfES project on ‘boys underachievement’ based in Cambridge).

3.4 In fact, very little of this dominant account has any validity. The confusion in this field can be seen in
the fact that as late as 1997, some respected writers in this field still believed that boys were outscoring
girls at GCSE (e.g. David et al. 1997), but that there was ‘a closing gender performance gap in most
subjects in GCSE’ (p.99) with ‘girls closing the gender gap’ (p.102). Recent re-analyses of the national
figures for attainment from Key Stage 1 to A level have shown that the gaps between girls and boys
have remained the same since the early 1990s, perhaps even declining slightly over time (Gorard et al.
1999). Where achievement gaps exist (and of the core subjects these only consistently appear in
English, and Welsh in Wales), they are at the highest levels of attainment, just as they are for gaps 
between the achievement of ethnic groups (Johnston and Viadero 2000). The nature and size of these 
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gaps vary regionally, and are clearly related to socio-economic factors. In fact, once the complexity of 
factors and obstacles such as home background, school structure, and social skills are taken into 
account a simple gendered explanation of achievement does not work (Kutnick 2000). Nor, apparently, 
do the simplistic solutions being suggested to the problem, such as single-sex teaching (see Harker 
2000). According to the best records we have boys have not attained higher grades (at 16+) than girls 
for at least 25 years. In fact, it is not even clear that we have any reliable evidence that boys have ever 
done better than girls in compulsory schooling. 



3.5 There is currently no sizeable or consistent gender gap at the lowest level of attainment in any public 
examination for any subject for any Key Stage. Approximately the same proportions of boys and girls 
of the relevant age gain at least the lowest level of each qualification (such as Level 1 at Key Stage 
One). In addition, for Mathematics and Science (and a few other curriculum areas) there is no sizeable 
or consistent gender gap at any level of attainment. Put another way, the assessment system is largely 
gender-neutral. There are achievement gaps in several curriculum areas, most notably English, other 
languages, and humanities. Where these appear, they are greatest at the highest level of attainment, 
mostly affecting a minority of (the most able) children (Table 2). These gaps are not increasing over 
time. The gaps in some subjects remain relatively static, while some are declining slightly. It is also worth 
noting that in subjects where children are assessed both by teachers and by a task/test, then the task/test 
produces lower achievement gaps (i.e. it is more gender neutral). 



Table 2 - Achievement gap in favour of girls: GCSE English

Year   Entry   A*   A    B    C    D    E    F    G
1992   20      -    27   23   16   10   5    1    0
1993   20      -    31   24   16   10   5    2    0
1994   30      43   34   27   18   11   5    1    0
1995   10      44   35   24   16   8    4    1    0
1996   10      43   36   25   16   9    4    1    0
1997   20      43   35   25   15   9    5    2    1

[table entries represent the extent to which girls outnumber boys in each cell. 20 as an entry gap shows
that 20% more girls sit the assessment. 16 as a gap at C grade shows that 16% more girls attain a C or
above]
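
The reading of the table entries given in the note can be illustrated with a short sketch. This section does not state the formula, so the sketch assumes that each gap is a proportionate difference, (girls - boys) / (girls + boys) x 100, and that the gap at a grade is expressed net of the gap in entries. The function names and the counts are hypothetical and are not the national figures.

```python
def proportionate_gap(girls: int, boys: int) -> float:
    """Gap in favour of girls as a percentage: (G - B) / (G + B) * 100."""
    total = girls + boys
    if total == 0:
        raise ValueError("no candidates in this cell")
    return 100.0 * (girls - boys) / total


def achievement_gap(girls_at_grade: int, boys_at_grade: int,
                    girls_entered: int, boys_entered: int) -> float:
    """Gap at a grade threshold, net of the entry gap (an assumed adjustment)."""
    return (proportionate_gap(girls_at_grade, boys_at_grade)
            - proportionate_gap(girls_entered, boys_entered))


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
entry = proportionate_gap(girls=510, boys=490)   # (510 - 490) / 1000 * 100
gap_c = achievement_gap(330, 270, 510, 490)      # grade gap minus entry gap
```

Because the gap is scaled by the combined number of girls and boys in each cell, it can be compared across years in which overall entries or pass rates differ, which a raw difference in percentage points does not allow.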



3.6 Figure 1 shows that there are year-on-year fluctuations in the overall achievement gap at the 5 A*-C
GCSE level (in favour of girls), but that girls have never scored lower than boys since 1974 (using
the same kind of figures as in the table above). It also suggests that until 1987/88 the overall trend of the
gap was relatively static with a low in 1978 and high in 1983. Just at the period when overall scores
begin to rise, there is a sudden jump in the size of the achievement gap over a two year period until the
gap stabilises again from 1988/89 to 1998/99. This could explain the perplexing, to some, finding that
from 1992 to 1997 the gender gap in a subject-by-subject analysis remained constant (Gorard 2001b).
In summary, the gender gap at GCSE is chiefly a phenomenon appearing between 1987 and 1989, and
growing only during that same period. This information could be key to our understanding of the
determinants of this gap.



Figure 1 - Achievement gap in favour of girls attaining 5+ GCSE A*-C 



3.7 The differential attainment of boys and girls at 16+ has appeared over a relatively brief period since
1987, concurrent with major increases in qualification levels for the entire 16-year-old cohort. The
introduction of the GCSE heralded several other major changes including the abolition of strict
norm-referencing at O-level which had previously worked to maintain results at a relatively constant level
(Foxman 1997). This was linked to the largest ever annual increase in the proportion of those reaching
the GCSE (or O-level equivalent) benchmark in 1988, and the second largest in 1989. In addition, the
publication of the results for the 16-year-old cohort replaced the previous School Leavers Survey
(which had included results from children of other age-groups) and formed the basis for new school
performance tables. It is surely no coincidence that the gender gap appeared at precisely the same time
as these changes, along with the introduction of coursework assessment and the onset of the National
Curriculum with SATs, which according to evidence from the Youth Cohort Study have all greatly
increased the chances of success for those from ‘poor’ backgrounds (Dolton et al. 1999).

3.8 The potential practical importance of such a basic finding cannot be over-estimated. Given that the
gender gap is, in fact, related to both social class and levels of attainment, then the appearance of a large
gap just when children from poorer families began to score more highly begins to suggest possible
explanations. Consider the following as one example of an implication for ameliorative strategies. If the
notion of what constitutes ‘work’ and what is appropriate for home varies by occupational class, it may
be that ‘working-class’ men, and their boys, do not bring work home, whereas ‘working-class’ women,
and their girls, do. If so, strategies such as homework clubs or Saturday sessions at school may be more
natural and therefore effective for such boys than homework pacts. Another possible conclusion to be
drawn from this would be that differential attainment by sex is a product of the changed system and
nature of assessments rather than any more general failing of boys, their ability, application, or the
competence of those who teach them. Such a conclusion, that differences are highly dependent on the
nature of assessment, would be supported by the recent debate over the apparent improvement in boys'
literacy as a result of the literacy hour where sensitivity to the precise nature of the test appeared to
determine the nature of the gender gap (Cassidy 2000), and by the finding that achievement gaps can
vary considerably depending on whether the assessment is by teacher or task/test.

3.9 Similar findings apply, where data are available, to differential attainment by ethnic group and by
economic region. It is not clear why differences between ethnic groups, regions, and sexes occupy so
much commentator attention. The gaps between other social groups, such as by first language or
between rich and poor, are much larger than the gender gap. Perhaps the biggest single gap is between
the high and low achievers. The achievement gaps between the top and bottom 10% are very large, and
completely dwarf any differences between boys and girls. However, these gaps are also inherent in the
nature of the assessment system. A system that did not differentiate at all would be dismissed as useless.
But these results show it is clearly possible to change the assessment system to reduce the gap in
‘surface’ attainment between any groups in the same way that ‘neutral’ IQ tests were developed, if that
is what is desired.

3.10 Although the methods used here allow fair comparisons over time and place, there is no method
suitable for comparing gaps in test scores between different age groups. It is impossible, for example,
to decide whether the gender gap is larger, smaller or the same at Key Stage Four as it is at Key Stage
Two (although this does not prevent commentators from making spurious comparisons based on
expected levels). The metrics are not equivalent. However, the gender gap in qualifications, such as it is,
reverses among adults in later life.



4. Individual level achievement 

4.1 Operationalising the concept of underachievement is key to appreciating which group of students
succeeds at school and also in understanding the confusion between low achievement and
underachievement. A recent study used detailed student-level data to measure and identify
underachievement among a group of over 2000 year 9 secondary school students (Smith 2002). Over
30 variables which the academic literature cites as being linked to academic performance (such as prior
attainment, attitude towards school and receipt of free school meals) were used to predict the future
examination performance of these students. Any individuals who failed to fulfil their potential (i.e. were
more than one standard deviation below their predicted score) were considered to be underachieving.
There was little to distinguish the underachieving students from their peers. While there were some
working class boys who underachieved, for example, using this definition, there were others who
overachieved. Indeed, students in the underachieving group came from across the ability range; therefore
it was possible to have a high ability underachiever as well as a low ability underachiever. The best
predictors of academic success were prior attainment and attendance at school (accounting for three
quarters of the variation in examination outcome), with sex and social class accounting for a negligible
amount of the variance. Students who came from more economically disadvantaged backgrounds
performed less well in the Key Stages 2 and 3 examinations in every subject, as well as being less
regular attenders at school - disadvantages which far exceeded those between the sexes. However,
these students were not disproportionately underachieving in terms of the model.
4.2 Where regression has been used elsewhere in the identification of underachievers, it has relied
almost exclusively on the school performance / mental ability test discrepancy described earlier (Lau and 
Chan 2001, Tuss et al. 1995, Whitmore 1980). Our new study used a larger number of additional 
variables related to academic performance (such as gender, ethnicity, poverty, motivation and prior 
attainment) to enhance the model and hence predict examination performance at age 14. At both Key 
Stages 2 and 3, pupils who received free school meals performed significantly lower than their peers. A 
similar pattern of results was obtained for performance in the Cognitive Ability Tests. With regard to 
academic performance and school attendance, receipt of free school meals appears to be more of a 
barrier to academic success than gender. A similar pattern emerges for other variables related to family 
income. 

4.3 Multiple regression analysis was used to predict performance in Key Stage 3 examinations, and
indicate pupils who may be underachieving. The initial model was run with all of the variables linked to
academic achievement. In this analysis, the following variables accounted for 83% of the variance in the
examination outcome at Key Stage 3: Prior Attainment, Attendance, Self-concept (reading), Free
School Meals, School factor, Self-concept (general), Gender, Family type, Parents’ evening, Month of
birth, Sibling order, Working mother, and Parental Involvement. One of the most striking findings from
this analysis was that gender explained less than 1% of the variance. This was lower than the levels of
variance attributed to receipt of free school meals, attendance, attitude towards reading and, of course,
prior attainment (a composite variable comprising Key Stage 2 and CAT scores, NFER 2002). When
considered alongside the findings for low achievement, this result suggests that the impact of gender on
overall achievement might not be as significant as was once thought.
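
The idea of a variable ‘explaining’ a share of the variance can be made concrete with a minimal sketch. It is not the analysis reported here: it uses synthetic data, only two predictors (prior attainment and gender) rather than the full set, and it treats the contribution of gender as the increase in R-squared when gender is added to a model that already contains prior attainment. All names and coefficients are hypothetical.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def r2_two_predictors(y, x1, x2):
    """R-squared of a least-squares regression of y on x1 and x2."""
    r1, r2, r12 = pearson(y, x1), pearson(y, x2), pearson(x1, x2)
    return (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)


# Synthetic pupils (not the study's data): scores driven mostly by prior attainment.
rng = random.Random(0)
prior = [rng.gauss(0, 1) for _ in range(2000)]
gender = [rng.randint(0, 1) for _ in range(2000)]
score = [0.9 * p + 0.05 * g + rng.gauss(0, 0.4) for p, g in zip(prior, gender)]

r2_prior_only = pearson(score, prior) ** 2
r2_with_gender = r2_two_predictors(score, prior, gender)
unique_to_gender = r2_with_gender - r2_prior_only  # extra variance explained by gender
```

In data of this kind a group difference can be real and still add almost nothing to the prediction once prior attainment is known, which is the sense in which a variable is said to explain very little of the variance.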

4.4 The analysis was then repeated to test the prediction that the underachieving group would comprise
mainly working class boys. Consequently, both gender and social class were omitted from the
regression analysis. The results show that 82% of the variance in Key Stage
3 performance could be accounted for by the variables entered into the model (minus gender and the
social class variables). The distribution of pupils’ actual and predicted Key Stage 3 score was such that
definite outliers could be identified. As such, pupils whose actual score lay more than one standard
deviation below what was predicted from the model were termed ‘underachievers’, and pupils whose
actual score was more than one standard deviation higher than predicted were termed ‘overachievers’.
The distribution of ‘underachieving’ and ‘overachieving’ boys according to social class was very similar
(Table 3). There was certainly no suggestion in this study that working class boys were underachieving.

Table 3 - Social class of boys in the overachieving (OA) and underachieving (UA) groups

Boys   Service   Intermediate   Working   Unpaid
       %         %              %         %
UA     18.7      21.6           46.7      1.3
OA     20.8      22.2           54.2      2.8
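
The outlier rule described in 4.4 can be sketched as follows. This is an illustration under stated assumptions, not the model used in the study: it uses synthetic data, a single predictor (prior attainment) in place of the full set of variables, and it takes ‘one standard deviation’ to mean one standard deviation of the residuals (actual minus predicted score). The function and variable names are hypothetical.

```python
import random
from statistics import mean, pstdev


def fit_line(x, y):
    """Least-squares intercept and slope for y = a + b * x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


def classify(x, y):
    """Label each pupil 'UA', 'OA' or 'as predicted' from the residual."""
    a, b = fit_line(x, y)
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    sd = pstdev(residuals)  # assumption: threshold is one SD of the residuals
    labels = []
    for r in residuals:
        if r < -sd:
            labels.append("UA")            # well below the predicted score
        elif r > sd:
            labels.append("OA")            # well above the predicted score
        else:
            labels.append("as predicted")
    return labels


# Synthetic pupils (not the study's data).
rng = random.Random(1)
prior = [rng.gauss(50, 10) for _ in range(1000)]
actual = [5 + 0.9 * p + rng.gauss(0, 5) for p in prior]
labels = classify(prior, actual)

# Prior attainment of the pupils flagged as underachieving.
ua_prior = [p for p, lab in zip(prior, labels) if lab == "UA"]
```

Because the label depends on the distance from the prediction rather than on the raw score, pupils flagged ‘UA’ in such a sketch can have high or low prior attainment, which is how a high ability underachiever and a low ability underachiever can both arise.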



4.5 There are two particular features of this study that set it apart from previous work conducted in this
area. First, it is unusual in that it has considered underachievers from across the ability range; thus, it
might be possible to have a high achieving underachiever (for example, someone who failed to convert
their three level six outcomes at Key Stage 3, to level sevens), or a low achieving underachiever (for
example, someone who achieved the same lower levels at both Key Stages 2 and 3). Secondly and
perhaps more crucially, it involved the manipulation of a comprehensive set of background factors that
have been cited in the literature as being closely linked to academic performance. The approach taken
here has avoided focusing solely on the ‘psychologist’s’ use of the term ‘underachievement’ (as the
mental ability test/school performance discrepancy), which characterises much of the research reported
in this area. In contrast, the use of background factors, such as motivation, along with indicators of
economic wellbeing in the identification of pupils who might be underachieving, has led to the
formulation of a stricter definition of underachievement that is a considerable improvement upon many
other studies.

4.6 Few differences were found between the male and female pupils’ scores on academic and
contextual variables. For example, girls were no more likely than boys to attend school regularly or to
have a higher score on the CAT or on end of Key Stage tests in maths and science. The only
consistently different results were in the examination scores in English. This was in contrast to the results
for pupils who received free school meals, who were disadvantaged on each of the above outcomes,
compared to pupils who were able to pay for their school meals. In the regression model that assessed
the impact of each variable on examination performance, the relationship with gender was once again
very weak. Here, gender accounted for less than 1% of the variance in examination outcome. These
findings have implications for commentators who place gender, and the underachievement of boys in
particular, at the heart of the standards debate and should provide a cautionary note for all those who
perpetuate the binary notion of boys versus girls.

4.7 Bringing together the results from the analysis of low achievers and underachievers it would seem
that many of the pupils labelled popularly in the media as ‘underachievers’ should actually be labelled as
low achievers. That the relatively poorer working class pupils generally do not do as well in school as
their more affluent counterparts may well be the case, but there is little to suggest that this group of
pupils is underachieving. Thus, we have two parallel concepts - low achievement and a stricter notion
of underachievement - that can be framed within the broader issue of achievement at group and
individual levels. Indeed, what this study has suggested is that applying the ‘underachievement’ label to a
diverse group of individuals is incorrect and perhaps an alternative label should be sought.



5. Implications 



5.1 One thrust of this paper has been to suggest that a consideration of standards or effectiveness is not 
a simple matter of counting and comparison (Gorard 2000d). Even where simplifying assumptions are 
made about the outcomes from schools, such as a concentration on statutory assessment and test 
results, philosophical and methodological difficulties persist. In light of these difficulties, there is certainly 
no evidence here of falling educational standards over time in Britain, no convincing evidence of 
underperformance relative to the educational systems of other developed nations, and no evidence of a 
highly polarised system. 

5.2 International and LEA-based comparisons suggest that comprehensive systems of schools based on 
parental choice tend to produce narrower social differences in intake and outcomes. Systems with more 
differentiation lead to greater gaps in attainment between social groups. Finland, for example, has a high 
average reading score, a small gap between high and low attainers and comprehensive schools and a 
policy of choice. Germany, on the other hand, has a much lower average reading score, a large
difference between high and low attainers, and a tiered system of selective schooling. The UK is
currently still in a reasonable comparative position, with a high average reading score, below average
differences between high and low attainers, and comprehensive schools with a policy of (limited) choice. 
The lessons for current policy are obvious. 

5.3 However, not all commentators are aware of this. There is a common crisis account of the position 
of UK schooling (and there is, perhaps, a tendency for all commentators to decry the position of their
own countries). For example, Johnson (2002) recently complained that ‘British students may be among
the world’s highest achievers, as the recent Organisation for Economic Co-operation and
Development’s PISA study found. But the achievement gap between social classes remains one of the
biggest in the world’ (p.23). This represents the view of the IPPR - an influential centre-left think tank.
The introduction of choice policies has, according to this account, led to a greater polarisation of
results. But this greater polarisation by parental occupation does not exist in the UK (Johnson has 
simply misread the data). Something that does not exist cannot, therefore, be the result of choice 
policies. 

5.4 There is no particular urgency about the issue of differential attainment by gender (certainly no more 
so than around 1988 when the only big increase took place) or by any other physiological group, and 
there may be many hidden dangers in tinkering with a school system already near 'initiative-overload'. 
The action research approach and the transfer of ‘successful’ school strategies for raising the
attainment of boys are both wasteful and unnecessary (not to mention inequitable). Since the current gap
has existed since 1988, is not growing, and is much smaller than other systematic gaps (such as those by 
background of student), we can take our time to search for ameliorative solutions if they are required. It 
might, for example, be much simpler to obtain gender neutrality through a reconsideration or redesign of 
the assessment system (whence the gap may have come), than through changes in classroom interaction 
(and similar comments apply to ethnicity). While still facing potential problems, on any rational 
comparison the UK school system is in the healthiest state ever. Raw score indicators of attainment are 
rising annually, gaps between social groups are reducing, and socio-economic segregation between 
schools has declined. We do not appear to need yet more major interventions to solve problems that do 
not exist and that detract from dealing with the problems that do. 
5.5 Perhaps the most important conclusions to be drawn are negative ones. The fact that boys and girls
perform the same at low levels of attainment (or indeed at all in some subjects), coupled with the relative
stasis of the gender gap since 1989, suggests that many potential explanations are now unworkable.
Any useful causal explanation would focus on high, not low, level attainment, and suggest an instant
one-off impact. Notably therefore, this differential attainment is not the result of a cultural change in society,
new methods of teaching, seating arrangements in schools, mixed-sex classes, boys’ laddishness, or
poor attendance at school. This has serious implications for the conduct of future work, and for the
validity of previous work, in this area. Longitudinal work with large-scale datasets has elucidated the
overall pattern, while the action research and the transfer-of-successful-strategies approaches adopted
by the DfES have been unhelpful at this stage. In fact, a considerable amount of public funding is being
wasted in attempting to solve a specific problem of underachievement at school that does not actually
exist.

5.6 Another conclusion to be drawn from this would be that differential attainment by gender is a
product of the changed system and nature of assessments rather than any more general failing of boys, 
their ability, application, or the competence of those who teach them. Such a conclusion - that 
differences are highly dependent on the nature of assessment - would be supported by the recent debate 
over the apparent improvement in boys’ literacy. This improvement was apparently the result of 
sensitivity to the precise nature of the test. It might, for example, be much simpler to obtain gender 
neutrality through a reconsideration or redesign of the assessment system (whence the gap may have 
come), than through changes in classroom interaction. Whatever ameliorative strategies are proposed, it 
would be preferable for them to be considered carefully in light of a fuller analysis of differential 
attainment than hitherto (especially through a consideration of the interaction of gender, ethnicity, 
poverty and so on). This should also be done with the full realisation that all such strategies may have 
longer term impacts on the lives of both men and women in adult society. 

5.7 Our value-added analyses of individual student performance data have called into question the
underachievement of large groups of students.

5.8 The use of school improvement models has led, indirectly, to an overemphasis on the most visible 
indicators of schooling - examination and test scores. There is a considerable danger of targets, based 
on these indicators, determining the practice of organisations. The use of test scores leads to three
related problems. It may marginalise other purposes and potential benefits of schooling. In addition, it
suggests that variations in the scores themselves are the product of school effects when the evidence
clearly shows otherwise. It also neglects the fact that the scores themselves are artificial, and technically
difficult to compare fairly over time or place. Our current examination system was designed to
differentiate between candidates. If it did not do so, it would be rejected, presumably, as ineffective. 
We cannot, logically, use this differentiation per se as evidence of underachievement. 

5.9 All of the findings from the kind of studies described here, and smaller-scale studies of classroom 
processes, will remain open to dispute until a decision is made to demand definitive experimental testing 
of the possible determinants of achievement. Meanwhile, we are often left with mere pseudo- 
explanations such as sex, region or social class. Even if we could show that sex was a cause of a level 
of achievement, we could not adjust the sex of individuals to ameliorate low achievement. Even if we 
find that achievement varied by region, we would be foolish to believe that transplanting populations
between regions would be practical or effective (and so on). This is what we mean by ‘pseudo- 
explanations’. 

5.10 General improvements in the standards of, and outcomes from, education appear to be reducing 
the educational inequalities between different social groups and geographical regions. Although 
inequality and injustice for the socially disadvantaged has always existed (MacKay 1999), in fact, 'if you 
take a long-term historical perspective of the provision of education in the UK throughout its entire 
statutory period... you could say that a constant move towards greater justice and equity has been the 
hallmark of the whole process' (p.344). If so, good. 